Learning Objectives

By the end of this lesson, you will be able to:

What is data visualization? (IBM’s Definition)

Data visualization is the representation of data through the use of common graphics, such as charts, plots, infographics, and even animations. These visual displays of information communicate complex data relationships and data-driven insights in a way that is easy to understand. This technique is mainly used for

Part 1. (Data visualization using base R)

plot function in R

For more details see [https://r-coder.com/plot-r/]

Syntax

plot(x, y, ...)
- The following arguments are optional:
for a dot plot: type = 'p' (default)
for a line chart: type = 'l'
to assign a plot title: main = "title", a character string
xlab = "Name of x variable", a character string
ylab = "Name of y variable", a character string
xlim = limits of the x values, a numeric vector of length 2
ylim = limits of the y values, a numeric vector of length 2
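
As a minimal sketch of how these optional arguments combine, the call below draws a line chart with a title, axis labels, and explicit axis limits. The vectors x_demo and y_demo are hypothetical values made up for illustration; they are not part of the lesson's data.

```r
# Hypothetical data for illustration only: y = x^2 on a regular grid
x_demo <- seq(0, 10, by = 0.5)
y_demo <- x_demo^2

plot(x_demo, y_demo,
     type = "l",                      # line chart instead of the default points
     main = "Line chart of y = x^2",  # plot title
     xlab = "x", ylab = "y",          # axis labels
     xlim = c(0, 10),                 # numeric range for the x-axis
     ylim = c(0, 100))                # numeric range for the y-axis
```

Changing type = "l" back to type = "p" would draw the same data as points.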

Scatter plot

A scatter chart (or a scatter plot) is a chart that shows the relationship between two quantitative variables.

  • A very powerful technique for investigating a relationship or trend.
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
x = iris$Sepal.Length
y= iris$Sepal.Width
plot(x, y)

x = iris$Sepal.Length
y= iris$Sepal.Width
plot(x, y, type = 'p', xlim = range(x), ylim = range(y), 
     xlab = "Sepal.Length", 
     ylab = "Sepal.Width", 
     main = "Association of Sepal.Length and Sepal.Width of iris data")

  • Color of points

For more details see here [http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf]

x = iris$Sepal.Length
y= iris$Sepal.Width
plot(x, y, col = 'red')
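
A color can be given by name, by hex code, or built with rgb(). The short sketch below shows three ways to obtain the same red; the variable names red_by_name, red_by_hex, and red_by_rgb are only illustrative.

```r
# A few of the color names that R recognizes
head(colors())

# The same red specified three ways
red_by_name <- "red"          # named color
red_by_hex  <- "#FF0000"      # hex code
red_by_rgb  <- rgb(1, 0, 0)   # red, green, blue intensities between 0 and 1

x <- iris$Sepal.Length
y <- iris$Sepal.Width
plot(x, y, col = red_by_hex)  # any of the three values can be passed to col
```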

  • Plot character in R

For more details see here [https://www.r-bloggers.com/2021/06/r-plot-pch-symbols-different-point-shapes-in-r/]

x = iris$Sepal.Length
y= iris$Sepal.Width
plot(x, y, col = 'red', pch = 10)
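
To see which symbol each pch value produces, one option is to draw them all side by side. This is a small sketch; the vector name pch_codes is illustrative.

```r
# Draw the plotting symbols for pch = 0, 1, ..., 25 in a single row
pch_codes <- 0:25
plot(pch_codes, rep(1, length(pch_codes)), pch = pch_codes,
     ylim = c(0.5, 1.5), xlab = "pch value", ylab = "", yaxt = "n",
     main = "Point shapes for pch = 0 to 25")
# Print each code above its symbol
text(pch_codes, 1.2, labels = pch_codes, cex = 0.7)
```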

x = iris$Sepal.Length
y= iris$Sepal.Width
plot(x, y, col = iris$Species, pch = 10, 
     main = "Color by Species")

  • Add another y-variable when the x-variable is the same
x = iris$Sepal.Length
y= iris$Sepal.Width
y1 = iris$Petal.Width
plot(x, y, ylim = range(c(y, y1)), col = iris$Species, pch = 10)
points(x, y1, col = iris$Species, pch = 20)

  • Combining plots

In the previous example we observed the association of Sepal.Width and Petal.Width with x-variable Sepal.Length. Now let’s observe the association of those y-variables with x-variables Sepal.Length and Petal.Length.

It is very easy to combine multiple plots into one overall graph in R using par(mfrow = c(i, j)).

par(mfrow = c(i, j)): combines the plots
i indicates number of rows
j indicates number of columns
  • Find the association of Sepal.Length with other variables
par(mfrow = c(2, 2))
#plot 1
x1 = iris$Sepal.Length
y1= iris$Sepal.Width
y2 = iris$Petal.Length
y3 = iris$Petal.Width
plot(x1, y1, xlab = "Sepal.Length",  ylab = "Sepal.Width", col = 'red', pch = 19)
plot(x1, y2, xlab = "Sepal.Length",  ylab = "Petal.Length", col = 'green', pch = 20)
plot(x1, y3, xlab = "Sepal.Length",  ylab = "Petal.Width",  col = 'black', pch = 21)
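
A layout set with par(mfrow = ...) stays in effect for later plots on the same graphics device. One common pattern, sketched here under the assumption of a fresh device, is to keep the value returned by par() and restore it afterwards. The names op and layout_after are illustrative.

```r
# par() returns the previous values of the parameters it changes
op <- par(mfrow = c(1, 2))

plot(iris$Sepal.Length, iris$Sepal.Width,
     xlab = "Sepal.Length", ylab = "Sepal.Width", pch = 19)
plot(iris$Sepal.Length, iris$Petal.Width,
     xlab = "Sepal.Length", ylab = "Petal.Width", pch = 19)

# Restore the layout that was active before the change
par(op)
layout_after <- par("mfrow")
layout_after
```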

  • Figure size or window in R

Note that in RStudio the graph is displayed in the plot pane, and in an R Markdown code chunk the figure size can be adjusted by assigning values to the chunk options fig.width and fig.height.
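
In an R Markdown document the size goes in the chunk header, for example {r, fig.width = 8, fig.height = 4}. Outside of a chunk, a comparable effect can be had by opening a graphics device with an explicit size. The sketch below uses pdf() for this; the file location out_file is a hypothetical temporary path.

```r
# Write a figure that is 8 inches wide and 4 inches tall
out_file <- tempfile(fileext = ".pdf")  # hypothetical output location
pdf(out_file, width = 8, height = 4)    # width and height are in inches

x <- rnorm(20)
y <- 2 * x + 1
plot(x, y)

dev.off()                               # close the device to finish the file
```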

x= rnorm(20)
y = 2*x+ 1
plot(x, y)

par(mfrow = c(1, 2))
#plot 1
x1 = iris$Sepal.Length
y1= iris$Sepal.Width
y2 = iris$Petal.Width
plot(x1, y1, ylim = range(c(y1, y2)), col = 'red', pch = 18)
points(x1, y2, col = 'blue', pch = 20)
#plot 2
x2= iris$Petal.Length
y3 = iris$Sepal.Width
y4 = iris$Petal.Width
plot(x2, y3, ylim = range(c(y3, y4)), col = 'red', pch = 18)
points(x2, y4, col = 'blue', pch = 20)

  • Space between combined figures

We can change the parameters mai, mar, tcl. Type help(par) in R-console for more details.

mai: A numerical vector of the form c(bottom, left, top, right) which gives the margin size specified in inches.
mar: A numerical vector of the form c(bottom, left, top, right) which gives the number of lines of margin to be specified on the four sides of the plot. The default is c(5, 4, 4, 2) + 0.1.

For more details see [https://datavoreconsulting.com/post/spacing-of-panel-figures-in-r/]
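
Because mar and mai describe the same margins in different units, setting one changes the other. The sketch below sets mar and then reads both back; the names op, mar_lines, and mai_inches are illustrative.

```r
# Set the margins in lines of text: bottom, left, top, right
op <- par(mar = c(4, 4, 2, 1))

mar_lines  <- par("mar")   # margins in lines
mai_inches <- par("mai")   # the same margins reported in inches
print(rbind(lines = mar_lines, inches = mai_inches))

par(op)                    # restore the previous margins
```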

par(mfrow = c(2, 2), tcl=-0.01, mai=c(0.5,0.5,0.5,0.5))
#plot 1
x = iris$Sepal.Length
y1= iris$Sepal.Width
y2 = iris$Petal.Width
plot(x, y1, ylim = range(y1),  xlab = "Sepal.Length", 
     ylab = "Sepal.Width", col = "black", pch = 18)
#plot2
plot(x, y2, ylim = range(y2), xlab = "Sepal.Length", 
     ylab = "Petal.Width",  col = 'blue', pch = 18)
#plot 3
x1 = iris$Petal.Length
y3= iris$Sepal.Width
y4 = iris$Petal.Width
plot(x1, y3, ylim = range(y3), xlab = "Petal.Length", 
     ylab = "Sepal.Width", col = "red", pch = 18)
#plot4
plot(x1, y4, ylim = range(y4),  xlab = "Petal.Length", 
     ylab = "Petal.Width",  col = 'green', pch = 18)

Line chart or trace plot

x1 = iris$Sepal.Length
x2 = iris$Petal.Length
idx = 1: length(x1)
plot(idx, x1, type = "l", xlab = "", ylab = "", col = 'red', lty = 1, 
     main = "Sepal.Length vs Petal.Length comparison")
# lines() adds to the existing plot; axis labels are not valid arguments here
lines(idx, x2, lty = 2, col = 'blue')

  • Let’s adjust the limit so that we can clearly see Petal.Length
x1 = iris$Sepal.Length
x2 = iris$Petal.Length
idx = 1: length(x1)
plot(idx, x1, type = "l", xlab = "", ylab = "", ylim = range(x1, x2), 
     lty = 1, col = 'red', main = "Sepal.Length vs Petal.Length comparison")
lines(idx, x2, lty = 2, col = 'blue')

  • Define legends in R

When we compare multiple variables using a trace plot or scatter plot, it is very hard to tell which line or set of points belongs to which variable. So, adding a legend is important in such cases.

For more details see [https://r-coder.com/add-legend-r/]

x1 = iris$Sepal.Length
x2 = iris$Petal.Length
idx = 1: length(x1)
plot(idx, x1, type = "l", xlab = "", ylab = "", ylim = range(x1, x2), 
     lty = 1, col = 'red', main = "Sepal.Length vs Petal.Length comparison")
lines(idx, x2, lty = 2, col = 'blue')
legend(x = "topleft",          # Position
       legend = c("Sepal.Length", "Petal.Length"),  # Legend texts
       lty = c(1, 2),           # Line types
       col = c('red', 'blue'),           # Line colors
       lwd = 2)                 # Line width
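
The same idea applies to a scatter plot colored by a factor. When col is given a factor, R uses the factor's integer codes (1, 2, 3, ...) to pick colors from the current palette, so a legend built from the same codes matches the points. A minimal sketch, with species_levels and legend_cols as illustrative names:

```r
species_levels <- levels(iris$Species)      # "setosa", "versicolor", "virginica"
legend_cols    <- seq_along(species_levels) # codes 1, 2, 3 index the palette

plot(iris$Sepal.Length, iris$Sepal.Width, col = iris$Species, pch = 19,
     xlab = "Sepal.Length", ylab = "Sepal.Width")
legend("topright", legend = species_levels, col = legend_cols,
       pch = 19, title = "Species")
```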

  • Legend outside the figure
# Make the window wider than it is tall
# Wrap the plot in a function so it can be redrawn after changing the margins
x1 = iris$Sepal.Length
x2 = iris$Petal.Length
idx = 1: length(x1)
plt = function() {
  plot(idx, x1, type = "l", xlab = "", ylab = "", ylim = range(x1, x2), 
       lty = 1, col = 'red', main = "Sepal.Length vs Petal.Length comparison")
  lines(idx, x2, lty = 2, col = 'blue')
}
# Save current graphical parameters
opar <- par(no.readonly = TRUE)

# Change the margins of the plot (the fourth is the right margin)
par(mar = c(5, 5, 5, 11))
plt()
legend(x = "topright",
       inset = c(-.2, 0), # You will need to fine-tune the first
                            # value depending on the windows size
       legend = c("Sepal.Length", "Petal.Length"),  # Legend texts 
       lty = c(1, 2),
       col = c('red', 'blue'), # Line colors
       lwd = 2,
       xpd = TRUE) # You need to specify this graphical parameter to
                   # put the legend outside the plot

# Back to the default graphical parameters
# (on.exit() only works inside a function, so restore directly)
par(opar)

Bar plot in R

A bar plot is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to their corresponding values (or count). The bars can be plotted vertically or horizontally.

car_counts_by_cyl = table(mtcars$cyl)
car_counts_by_cyl
## 
##  4  6  8 
## 11  7 14
# One row, two columns
par(mfrow = c(1, 2))

# Absolute frequency barplot
barplot(car_counts_by_cyl, main = "Absolute frequency",
        col = rainbow(3))

# Relative frequency barplot
barplot(prop.table(car_counts_by_cyl) * 100, main = "Relative frequency (%)",
        col = rainbow(3))

Boston311_2023_data = read.csv("https://data.boston.gov/dataset/8048697b-ad64-4bfc-b090-ee00169f2323/resource/e6013a93-1321-4f2a-bf91-8d8a02f1e62f/download/tmp518q5snq.csv")
  • Which neighborhood has the maximum number of complaints of type “Parking Enforcement”?
library(stringr)
library(dplyr)
# Flag the cases whose title mentions "Parking Enforcement"
Boston311_2023_data$Parking_Enforcement_status <- str_detect(Boston311_2023_data$case_title, regex("\\bParking Enforcement\\b"))
# Keep only the flagged cases, then count them by neighborhood
Parking_Enforcement_by_nbd <- Boston311_2023_data %>%
  filter(Parking_Enforcement_status) %>%
  group_by(neighborhood) %>%
  summarise(nbd_count_Parking_Enforcement = n()) %>%
  arrange(desc(nbd_count_Parking_Enforcement))
head(Parking_Enforcement_by_nbd, 10)
top_10_nbd = Parking_Enforcement_by_nbd[1:10, ]
barplot(height = top_10_nbd$nbd_count_Parking_Enforcement, names.arg = top_10_nbd$neighborhood,
 col = rainbow(10), las = 2)

#las = 1, group names always printed horizontally
#las = 2, group names printed perpendicular to the axis (vertically on the x-axis)
par(mar, mgp, las)
par(mar=c(5.1, 4.1, 4.1, 2.1), mgp=c(3, 1, 0), las=0)
par sets or adjusts plotting parameters. Here we consider the following three parameters: margin size (mar), axis label locations (mgp), and axis label orientation (las).
mar – A numeric vector of length 4, which sets the margin sizes, in lines of text, in the following order: bottom, left, top, and right. The default is c(5.1, 4.1, 4.1, 2.1).
mgp – A numeric vector of length 3, which sets the margin line, counted outward from the edge of the plot region, for three things. The first value is the location of the axis titles (i.e. xlab and ylab in plot), the second the tick-mark labels, and the third the axis line. The default is c(3, 1, 0).
las – A numeric value indicating the orientation of the tick-mark labels and of margin text added with mtext. The options are as follows: always parallel to the axis (the default, 0), always horizontal (1), always perpendicular to the axis (2), and always vertical (3).
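
The three parameters are often changed together when the bar labels are long. The sketch below is one possible combination, using the built-in mtcars data rather than the Boston data. The names hp_by_car, hp_top, and op are illustrative, and the margin values were chosen by eye.

```r
# Horsepower of the eight most powerful cars, labelled by car name
hp_by_car <- setNames(mtcars$hp, rownames(mtcars))
hp_top <- head(sort(hp_by_car, decreasing = TRUE), 8)

# Larger bottom margin for the rotated names, tick labels closer to the axis,
# and labels drawn perpendicular to the axis
op <- par(mar = c(9, 4.1, 2, 1), mgp = c(3, 0.7, 0), las = 2)
barplot(hp_top, col = "steelblue", ylab = "Horsepower")
par(op)  # restore the previous settings
```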

Horizontal barplot

par(mar = c(4, 16, 2, 2))
top_10_nbd = Parking_Enforcement_by_nbd[1:10, ]
barplot(height = top_10_nbd$nbd_count_Parking_Enforcement, names.arg = top_10_nbd$neighborhood,
 col = rainbow(10), horiz = TRUE, las = 1)

Barplot for continuous variable

var1 = iris$Sepal.Length
# Cut points that split the continuous variable into four ordered groups
cut_off = c(0, 5, 6, 7 , 8)
category = c("low", "low_mid", "high_mid", "high")
Sepal_Len_cat1 = cut(var1, breaks = cut_off, labels = category)
iris_new = cbind(iris, Sepal_Len_cat1)
# Bar plot of the group counts, with a legend
barplot(table(iris_new$Sepal_Len_cat1), col = rainbow(4), legend.text = levels(iris_new$Sepal_Len_cat1))

Grouped barplot in R

# Variable am to factor
am = mtcars$am
am <- factor(am)
# Change factor levels
levels(am) <- c("Automatic", "Manual")
summary_data <- tapply(mtcars$hp, list(cylinders = mtcars$cyl,
                      transmission = am),FUN = mean, na.rm = TRUE)
summary_data
##          transmission
## cylinders Automatic   Manual
##         4  84.66667  81.8750
##         6 115.25000 131.6667
##         8 194.16667 299.5000
par(mar = c(5, 5, 4, 10))

barplot(summary_data, xlab = "Transmission type",
        main = "Horsepower mean",
        col = rainbow(3),
        beside = TRUE,
        legend.text = rownames(summary_data),
        args.legend = list(title = "Cylinders", x = "topright",
                           inset = c(-0.20, 0)))

Stacked barplot in R

par(mar = c(5, 5, 4, 10), las = 0)
barplot(summary_data,
        main = "Horsepower mean",
        xlab = "Transmission type", ylab = "HP mean",
        col = c('red', 'blue', 'green'),
        legend.text = rownames(summary_data),
        beside = FALSE, # Stacked bars (default)
        args.legend = list(title = "Cylinders", x = "topright",
                           inset = c(-0.3, 0)))

Pie-chart

A pie chart is used to represent data as numerical proportions. A pie chart in R is created using the pie() function.

# cyl-wise distribution of data using pie-chart
count_cars <- mtcars %>% 
  group_by(cyl) %>%
  summarise(count = n())

car_type <- paste(count_cars$cyl, "cyl")
count <- count_cars$count
# calculate each group's percentage share
perc <- round(count/sum(count)* 100, 2)
# add the percentage to the car-type names to create labels
labels <- paste(car_type, perc,'%')
pie(count, labels = labels,radius = 1, col = hcl.colors(n = 3, palette = 'ag_Sunset'),  border = 'gray', main = "Pie chart in R")

Straight line in R

  1. Horizontal line in R
  • Create an empty box
# type = 'n' sets up the axes and the box without drawing any points
plot(1, type = 'n', xlab = "",
     ylab = "", xlim = c(0, 5), 
     ylim = c(0, 5))
abline(h = 2, col = 'red')

  2. Vertical line in R
  • Create an empty box
# type = "n" sets up the axes and the box without drawing any points
plot(1, type = "n", xlab = "",
     ylab = "", xlim = c(0, 5), 
     ylim = c(0, 5))
abline(v = 2, col = 'red')

  3. Horizontal and vertical line in R
# type = "n" sets up the axes and the box without drawing any points
plot(1, type = "n", xlab = "",
     ylab = "", xlim = c(0, 5), 
     ylim = c(0, 5))
abline(h = 2.5, v = 2, col = 'red')

  4. Line with slope and intercept in R
# type = "n" sets up the axes and the box without drawing any points
plot(1, type = "n", xlab = "",
     ylab = "", xlim = c(0, 5), 
     ylim = c(0, 5))
abline(a = 0, # Intercept
       b = 1, col = 'red')  # Slope
abline(a = 5, # Intercept
       b = -1, col = 'blue')  # Slope

Histogram

A histogram is the most widely used graph for representing quantitative (numerical) data, mostly data that are continuous in nature.

  • This type of graph is useful for getting an idea of the shape of the distribution of the data.
  • It gives an idea of the density of the underlying distribution of the data.
  • It is often useful for density estimation.

Syntax

hist(x, ...)
hist(x, breaks = "Sturges",
     freq = NULL, probability = !freq,
     include.lowest = TRUE, right = TRUE,
     density = NULL, angle = 45, col = NULL, border = NULL,
     main = paste("Histogram of" , xname),
     xlim = range(breaks), ylim = NULL,
     xlab = xname, ylab,
     axes = TRUE, plot = TRUE, labels = FALSE,
     nclass = NULL, warn.unused = TRUE, ...)
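
hist() also returns the numbers behind the picture. As a small sketch, calling it with plot = FALSE gives the bin breaks, counts, and densities without drawing anything. The names h and total_area are illustrative.

```r
# Compute the histogram without plotting it
h <- hist(iris$Sepal.Length, breaks = "Sturges", plot = FALSE)

h$breaks    # bin boundaries
h$counts    # number of observations in each bin
h$density   # counts scaled so that the bar areas add up to 1

# Bar area = density * bin width; the areas sum to 1
total_area <- sum(h$density * diff(h$breaks))
total_area
```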

Example 1 (Histogram)

  • Histogram with frequency
hist(iris$Sepal.Length, breaks = 20, col = 'gray', freq = TRUE)

  • Histogram with density (relative frequency per unit of bin width)
hist(iris$Sepal.Length, breaks = 15, xlab = 'Sepal.Length', ylab = 'Density', probability = TRUE, col = 'gray', main = "Histogram of Sepal.Length of Iris data")
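
With probability = TRUE the bar heights are densities, which equal relative frequencies only when every bin has width 1. If the proportion of observations per bin is wanted instead, one possible approach (a sketch, not the only way) is to compute the histogram first and rescale its counts before plotting. The name rel_hist is illustrative.

```r
# Compute the bins without plotting
rel_hist <- hist(iris$Sepal.Length, breaks = 15, plot = FALSE)

# Replace the counts by proportions of the total
rel_hist$counts <- rel_hist$counts / sum(rel_hist$counts)

# freq = TRUE tells plot() to draw the (rescaled) counts
plot(rel_hist, freq = TRUE, col = "gray",
     xlab = "Sepal.Length", ylab = "Relative frequency",
     main = "Relative frequency histogram of Sepal.Length")
```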

Histogram with two variables

par(mfrow = c(2, 2))
x <- iris$Sepal.Length    # First group
y <- iris$Petal.Length    # Second group

hist(x,  main = "Histogram of Sepal.Length")
hist(y,  main = "Histogram of Petal.Length")

# Combined plot: a semi-transparent fill keeps the first histogram visible
hist(x, xlim = c(0, 8),ylim = c(0, 50),  main = "Histogram of Two variables")
hist(y, add = TRUE, col = rgb(1, 0, 0, alpha = 0.5))

Add kernel density to histogram (non-parametric curve)

par(mfrow = c(1, 2))
x <- iris$Sepal.Length    # First group
y <- iris$Petal.Length    # Second group

hist(x, probability = TRUE,  main = "Histogram of Sepal.Length")
lines(density(x), lwd = 2, col = 'red')
hist(y,  probability = TRUE, main = "Histogram of Petal.Length")
lines(density(y), lwd = 2, col = 'red')

Add normal density to histogram

x <- iris$Sepal.Length    # First group
y <- iris$Petal.Length    # Second group
hist(x, ylim = c(0, 0.5), probability = TRUE, 
     main = "Histogram of Sepal.Length")
x_val = seq(min(x), max(x), length.out = 100)
f_val = dnorm(x_val, mean = mean(x), sd = sd(x))
lines(x_val, f_val, lwd = 2, col = 'red')

Box plot in R

Box plots (Chambers 1983) are an excellent tool for detecting and illustrating location and variation changes between different groups of data.

  • It allows us to summarize the main characteristics of the data such as position, dispersion, skewness
  • It is a very useful tool to identify potential data outliers.
boxplot(x, ylab = "Sepal.Length")

  • Horizontal box plot
boxplot(x, xlab = "Sepal.Length", horizontal = TRUE)

boxplot(x, xlab = "Sepal.Length", horizontal = TRUE)
stripchart(x, method = "jitter", pch = 19, add = TRUE, col = "red")

  • When the data have no outliers, the ends of the two whiskers represent the min and max values; similarly, the boundaries of the box are Q1 and Q3, and the line inside the box represents the median.
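
The five values that boxplot() draws can be inspected directly. The sketch below uses boxplot.stats() and fivenum() on Sepal.Length; the names sepal_len, five_num, and box_stats are illustrative. Note that boxplot() uses hinges for the box edges, which are close to the quartiles from quantile() but can differ slightly.

```r
sepal_len <- iris$Sepal.Length

# Minimum, lower hinge, median, upper hinge, maximum
five_num <- fivenum(sepal_len)
five_num

# The statistics boxplot() uses: whisker ends, box edges, median, and outliers
box_stats <- boxplot.stats(sepal_len)
box_stats$stats   # matches five_num when there are no outliers
box_stats$out     # values drawn as individual outlier points
```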

Simple but widely used idea for detecting an outlier

IQR = Q3 - Q1
Usual low value, L = Q1 - 1.5*IQR
Usual high value, U = Q3 + 1.5*IQR
Any value outside the range between L and U is considered an outlier
x <- rnorm(50, 20, 5)
x1 <- c(-4, -7, 0, 50, 55) # add few extreme data points
x <- c(x, x1)
boxplot(x)

Q1 <- quantile(x, probs = 0.25)
Q3 <- quantile(x, probs = 0.75)
IQR <- Q3 - Q1
L <- Q1 - 1.5*(IQR)
U <- Q3 + 1.5*(IQR)
boxplot(x, horizontal = TRUE, main = "Detection of outliers using boxplot")
abline(v = L, col = 'red')
abline(v = U, col = 'blue')
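
Beyond drawing the limits, the same rule can be used to list the flagged values. The sketch below repeats the simulation with a fixed seed so the result is reproducible; the seed and the names x_sim, lower, upper, and outliers are assumptions made for illustration.

```r
set.seed(123)  # hypothetical seed, only to make the example reproducible
x_sim <- c(rnorm(50, 20, 5), -4, -7, 0, 50, 55)  # data plus a few extreme points

q1 <- unname(quantile(x_sim, probs = 0.25))
q3 <- unname(quantile(x_sim, probs = 0.75))
iqr_val <- q3 - q1

lower <- q1 - 1.5 * iqr_val   # usual low value
upper <- q3 + 1.5 * iqr_val   # usual high value

# Values outside [lower, upper] are flagged as outliers
outliers <- x_sim[x_sim < lower | x_sim > upper]
outliers
```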

Boxplot of multiple variables

boxplot(Sepal.Length ~ Species, data = iris, col = rainbow(3), horizontal = FALSE)

Scatter plot with regression line

x = iris$Sepal.Length
y = iris$Petal.Length
plot(x, y, pch = 19, col = "gray52")


# Linear fit
abline(lm(y ~ x), col = "orange", lwd = 3)

# Smooth fit
lines(lowess(x, y), col = "blue", lwd = 3)

# Legend
legend("topleft", legend = c("Linear", "Smooth"),
       lwd = 3, lty = c(1, 1), col = c("orange", "blue"))
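
The line drawn by abline(lm(y ~ x)) comes from a fitted model, so its intercept and slope can be read off the fit. A brief sketch, with fit, slope, and r_squared as illustrative names:

```r
x <- iris$Sepal.Length
y <- iris$Petal.Length

fit <- lm(y ~ x)   # the same linear fit that abline() draws
coef(fit)          # intercept and slope of the line

slope <- unname(coef(fit)["x"])
r_squared <- summary(fit)$r.squared   # share of variance explained by the line
c(slope = slope, r_squared = r_squared)
```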

Scatterplot matrix in R

A scatterplot matrix shows the pairwise association between variables, one scatter plot per pair.

#numerical_df <- subset(iris, select = c(Sepal.Length, Sepal.Width,Petal.Length,Petal.Width))
#pairs(numerical_df)
pairs(~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)

  • Pairs plot by category
pairs(~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, col = iris$Species, data = iris)
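
A numerical companion to the scatterplot matrix is the correlation matrix of the same variables. A minimal sketch, with num_vars and cor_mat as illustrative names:

```r
# Keep the four numeric columns of iris
num_vars <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]

# Pairwise Pearson correlations, laid out like the panels of pairs()
cor_mat <- cor(num_vars)
round(cor_mat, 2)
```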

Part 2. (Data visualization in R using ggplot)

ggplot2 is one of the most used packages for data visualization in R and it builds plots in layers.
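
To illustrate what "in layers" means, the sketch below builds a plot step by step and stores each stage in its own object. It assumes the ggplot2 package is installed; the object names base_plot, with_points, and with_labels are illustrative.

```r
library(ggplot2)

# Step 1: data and aesthetic mapping only; nothing is drawn yet
base_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))

# Step 2: add a layer of points, colored by Species
with_points <- base_plot + geom_point(aes(colour = Species))

# Step 3: add a title and axis labels on top
with_labels <- with_points +
  labs(title = "Sepal dimensions of iris", x = "Sepal.Length", y = "Sepal.Width")

print(with_labels)
```

Each `+` returns a new plot object, so earlier stages such as base_plot can be reused with different layers.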

Scatter plot

# install.packages("ggplot2")
library(dplyr)
library(ggplot2)
iris %>%
ggplot() +
  aes(x = Sepal.Length, y = Sepal.Width) + 
  geom_point(size=2, shape=10) 

#geom_point(aes(size=Sepal.Length))

Label points in the scatter plot

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(colour = Species)) + # Points and color by group
  scale_color_discrete("Type") +  # Change legend title
  xlab("Sepal.Length") +              # X-axis label
  ylab("Sepal.Width")  +             # Y-axis label
  theme(axis.line = element_line(colour = "black", # Changes the default theme
                                 linewidth = 0.24)) # linewidth replaces the deprecated size argument (ggplot2 >= 3.4.0)

Title

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  ggtitle("Scatter plot in R") +
  theme(plot.title = element_text(hjust=0.5)) + # Center the title
  geom_point(aes(color = Species)) + # Points and color by group
  #scale_color_discrete("type") +  # Change legend title
  xlab("Sepal.Length") +              # X-axis label
  ylab("Sepal.Width")  +             # Y-axis label
  theme(axis.line = element_line(colour = "red", linewidth = 0.5)) # Changes the default theme (xy-axes)

  • Add a reference line with geom_abline()
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_abline(intercept = 3, slope =  0  ) + # Horizontal line at y = 3
  ggtitle("Scatter plot in R") +
  theme(plot.title = element_text(hjust=0.5)) + # Center the title
  geom_point(aes(colour = Species)) + # Points and color by group
  scale_color_discrete("Species") +  # Change legend title
  xlab("Sepal.Length") +              # X-axis label
  ylab("Sepal.Width")  +             # Y-axis label
  theme(axis.line = element_line(colour = "black", # Changes the default theme
                                 linewidth = 0.01))

  • Remove the grids

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  ggtitle("Scatter plot in R") +
  theme(plot.title = element_text(hjust=0.5)) + # Center the title
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
  #theme_void() + #remove background
  #theme_classic() +#remove background
  geom_point(aes(colour = Species)) + # Points and color by group
  #scale_color_discrete("Species") +  # Change legend title
  xlab("Sepal.Length") +              # X-axis label
  ylab("Sepal.Width") +             # Y-axis label
  theme(axis.line = element_line(colour = "black", # Changes the default theme
                                 linewidth = 0.5))

Line plots in R

# Change the line type
ggplot(data=iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_line(linetype = "dashed")

# Solid line with the data points added
ggplot(data=iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_line(linetype = "solid")+
  geom_point()

# Color the lines and points by group
ggplot(data=iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_line(aes(colour = Species))+
  geom_point(aes(colour = Species)) + # Points and color by group
  #scale_color_discrete("Species") +  # Change legend title
  xlab("Sepal.Length") +              # X-axis label
  ylab("Sepal.Width")                 # Y-axis label

Bar plot in R

df = mtcars %>% 
  group_by(cyl)%>% 
  summarise(count = n())
# Basic barplot
ggp <- ggplot(data=df, aes(x = cyl, y = count,  fill = factor(cyl))) +
  geom_bar(stat="identity", width=1) +
  theme_minimal()
ggp

  • Count bar plot
# Don't map a variable to y: geom_bar() counts the rows in each group

ggp <- ggplot(mtcars, aes(x=factor(cyl), fill = factor(cyl)))+
  geom_bar() +
  theme_minimal()
ggp

  • Horizontal plot
top_10_nbd = Parking_Enforcement_by_nbd[1:10, ] 

ggp <- ggplot(top_10_nbd, aes(y=neighborhood, x = nbd_count_Parking_Enforcement, fill = neighborhood ))+
  geom_bar(stat="identity") +
  scale_fill_discrete(name = "Neighborhood")+ # Legend title for the fill colors
  xlab("Parking enforcement count by Neighborhood") +  # X-axis label
  ylab("Neighborhood") +                # Y-axis label
  theme_minimal()
ggp

  • Order values in plot
top_10_nbd = Parking_Enforcement_by_nbd[1:10, ] 

ggp <- ggplot(top_10_nbd, aes(y=reorder(neighborhood, nbd_count_Parking_Enforcement), x = nbd_count_Parking_Enforcement, fill = neighborhood ))+
  geom_bar(stat="identity") +
  scale_fill_discrete(name = "Neighborhood")+ # Legend title for the fill colors
  theme_void()
ggp

Sub-divided bar plot and pie-chart

df = mtcars %>% 
  group_by(cyl)%>% 
  summarise(count = n())

df$cyl = as.factor(df$cyl)
# Basic barplot
ggp <- ggplot(data=df, aes(x ='', y = count, fill = cyl)) +
  geom_bar(stat="identity", width=0.7) +
  theme_minimal()
ggp

  • Pie chart
ggp <- ggplot(data=df, aes(x = '', y = count, fill = cyl)) +
  geom_bar(stat="identity", width=0.7) +
  coord_polar("y", start=0)
ggp

df$perc = round(df$count/sum(df$count),4) *100
ggp <- ggplot(data=df, aes(x = '', y = perc, fill = cyl)) +
   geom_col() +
  geom_text(aes(label = paste(perc, '%')), color = rep("white", 3),
            position = position_stack(vjust = 0.5)) +
  coord_polar(theta = "y") + 
  theme_void()
ggp

Histogram in R

# Basic histogram
ggplot(iris, aes(x=Sepal.Length)) + 
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Change the width of bins
ggplot(iris, aes(x=Sepal.Length)) + 
  geom_histogram(binwidth=0.3)

# Change colors
p <-ggplot(iris, aes(x=Sepal.Length)) + 
  geom_histogram(color="black", fill="gray")+
  theme_void()
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Add mean line and density plot on the histogram

# Add mean line
p + geom_vline(aes(xintercept=mean(Sepal.Length)),
            color="blue", linetype="dashed", linewidth=1)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Histogram with density plot
ggplot(iris, aes(Sepal.Length)) + 
 geom_histogram(aes(y = after_stat(density)), colour="black", fill="white")+
 geom_density(alpha=.1, fill="red") + #transparency parameter
theme_minimal()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Change histogram plot line colors by groups
ggplot(iris, aes(x=Sepal.Length, color=Species)) +
  geom_histogram(fill="gray")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Overlay semi-transparent histograms, with fill and line colors by group
ggplot(iris, aes(x=Sepal.Length,fill=Species, color=Species)) +
  geom_histogram(position="identity", alpha=0.5)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Use facets

stat_summary <- iris %>% 
  group_by(Species) %>% 
  summarise(mean_SepL = mean(Sepal.Length), median_SepL = median(Sepal.Length))

stat_summary <- data.frame(Species = rep(stat_summary$Species, 2), stat = c(stat_summary$mean_SepL, stat_summary$median_SepL), value = rep(c('mean', 'median'), each = 3))

p <- ggplot(iris, aes(x=Sepal.Length))+
  geom_histogram(color="black", fill="steelblue")+
  facet_grid(Species ~ .) +
  geom_vline(data = stat_summary, mapping = aes(xintercept = stat, color = value))
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Box plot

# Basic box plot
p <- ggplot(iris, aes(x=Sepal.Length, y=Species)) + 
  geom_boxplot()
p

# Rotate the box plot
p + coord_flip()

# Notched box plot
ggplot(iris, aes(x=Sepal.Length, y=Species)) + 
  geom_boxplot(notch=TRUE)

# Change outlier, color, shape and size
ggplot(iris, aes(x=Sepal.Length, y=Species)) + 
  geom_boxplot(outlier.colour="red", outlier.shape=8,
                outlier.size=4)

Change box plot line colors

Box plot line colors can be controlled automatically by the levels of a grouping variable:

# Change box plot line colors by groups
p<-ggplot(iris, aes(y=Sepal.Length, x=Species, color = Species)) +
  geom_boxplot()
p

Change box plot fill colors

# Change box plot colors by groups
p<- ggplot(iris, aes(y=Sepal.Length, x=Species,  fill= Species)) +
   geom_boxplot()
p